    Hypergraph-partitioning-based remapping models for image-space-parallel direct volume rendering of unstructured grids

    In this work, image-space-parallel direct volume rendering (DVR) of unstructured grids is investigated for distributed-memory architectures. A hypergraph-partitioning-based model is proposed for the adaptive screen partitioning problem in this context. The proposed model aims to balance the rendering loads of processors while trying to minimize the amount of data replication. In the parallel DVR framework we adopted, each data primitive is statically owned by its home processor, which is responsible for replicating its primitives on other processors. Two remapping models are then proposed by enhancing this model for use within the framework. These remapping models aim to minimize the total volume of communication in data replication while balancing the rendering loads of processors. Based on the proposed models, a parallel DVR algorithm is developed. The experiments conducted on a PC cluster show that the proposed remapping models achieve better speedup values compared to the remapping models previously suggested for image-space-parallel DVR. © 2007 IEEE
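
    To make the screen-partitioning model concrete, the following minimal sketch (hypothetical data layout and function names; the actual work feeds such a hypergraph to a real partitioning tool) encodes screen cells as vertices weighted by estimated rendering load and data primitives as nets weighted by their data sizes, so that the connectivity-1 cut of a K-way partition measures the volume of replicated data:

    from collections import defaultdict

    def build_screen_hypergraph(primitives):
        """primitives: (render_cost, data_size, screen_cells_covered) triples.
        Each screen cell becomes a vertex weighted by the rendering load that
        falls on it; each primitive becomes a net connecting the cells it
        projects onto, weighted by the primitive's data size."""
        vertex_weight = defaultdict(float)
        nets = []
        for render_cost, data_size, cells in primitives:
            nets.append((data_size, frozenset(cells)))
            for cell in cells:
                vertex_weight[cell] += render_cost / len(cells)
        return vertex_weight, nets

    def replication_and_imbalance(vertex_weight, nets, part_of_cell, K):
        """A net spanning L parts forces L-1 replicas of its primitive, so the
        connectivity-1 cut is the replicated data volume; the balance constraint
        is on the per-processor rendering loads."""
        load = [0.0] * K
        for cell, w in vertex_weight.items():
            load[part_of_cell[cell]] += w
        volume = sum(size * (len({part_of_cell[c] for c in cells}) - 1)
                     for size, cells in nets)
        return volume, max(load) * K / sum(load) - 1.0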

    Data-parallel web crawling models

    The need to quickly locate, gather, and store the vast amount of material on the Web necessitates parallel computing. In this paper, we propose two models, based on multi-constraint graph partitioning, for efficient data-parallel Web crawling. The models aim to balance the amount of data downloaded and stored by each processor as well as the number of page requests made by each processor. The models also minimize the total volume of communication during the link exchange between the processors. To evaluate the performance of the models, experimental results are presented on a sample Web repository containing around 915,000 pages. © Springer-Verlag 2004
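
    The two balance constraints and the communication objective can be stated in a few lines. The sketch below (page-level granularity, link volumes, and names are illustrative assumptions) evaluates a given page-to-processor assignment under the multi-constraint formulation: storage and request loads must both be balanced while the total volume of link exchange crossing processor boundaries is minimized.

    def evaluate_crawl_partition(pages, links, part, K):
        """pages: {page: (bytes_to_store, num_requests)}; links: [(u, v, volume)];
        part: {page: processor}. Returns the imbalance of each constraint and
        the total link-exchange volume crossing processor boundaries."""
        store = [0.0] * K
        requests = [0.0] * K
        for page, (num_bytes, num_requests) in pages.items():
            store[part[page]] += num_bytes
            requests[part[page]] += num_requests
        cut = sum(volume for u, v, volume in links if part[u] != part[v])
        def imbalance(loads):
            return max(loads) * K / sum(loads) - 1.0
        return imbalance(store), imbalance(requests), cut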

    Multi-level direct K-way hypergraph partitioning with multiple constraints and fixed vertices

    K-way hypergraph partitioning has an ever-growing use in parallelization of scientific computing applications. We claim that hypergraph partitioning with multiple constraints and fixed vertices should be implemented using direct K-way refinement, instead of the widely adopted recursive bisection paradigm. Our arguments are based on the fact that recursive-bisection-based partitioning algorithms perform considerably worse when used in the multiple constraint and fixed vertex formulations. We discuss possible reasons for this performance degradation. We describe a careful implementation of a multi-level direct K-way hypergraph partitioning algorithm, which performs better than a well-known recursive-bisection-based partitioning algorithm in hypergraph partitioning with multiple constraints and fixed vertices. We also experimentally show that the proposed algorithm is effective in standard hypergraph partitioning. © 2007 Elsevier Inc. All rights reserved
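
    As an illustration of why direct K-way refinement fits the fixed-vertex and multi-constraint setting, the toy pass below (greedy and single-level; names, data layout, and the balance test are assumptions, not the paper's full multi-level algorithm) considers moves of free vertices between all K parts at once, keeps fixed vertices in place, and accepts a move only if every constraint stays within its allowed load.

    def refine_pass(nets_of, pins_of, weights, part, fixed, K, max_load):
        """nets_of[v]: nets incident to vertex v; pins_of[n]: vertices of net n;
        weights[v]: tuple of constraint weights; fixed: vertices that may not move;
        max_load[c]: allowed load of constraint c in any part."""
        num_constraints = len(max_load)
        loads = [[0.0] * num_constraints for _ in range(K)]
        for v, w in weights.items():
            for c in range(num_constraints):
                loads[part[v]][c] += w[c]
        for v in weights:
            if v in fixed:
                continue                      # fixed vertices keep their part
            best_gain, best_k = 0, part[v]
            for k in range(K):
                if k == part[v]:
                    continue
                # the move must keep every constraint within its allowed load
                if any(loads[k][c] + weights[v][c] > max_load[c]
                       for c in range(num_constraints)):
                    continue
                gain = 0
                for n in nets_of[v]:
                    other_parts = {part[u] for u in pins_of[n] if u != v}
                    if part[v] not in other_parts:
                        gain += 1             # net no longer spans the source part
                    if k not in other_parts:
                        gain -= 1             # net starts spanning the target part
                if gain > best_gain:
                    best_gain, best_k = gain, k
            if best_k != part[v]:
                for c in range(num_constraints):
                    loads[part[v]][c] -= weights[v][c]
                    loads[best_k][c] += weights[v][c]
                part[v] = best_k
        return part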

    Incorporating the surfing behavior of web users into PageRank

    In large-scale commercial web search engines, estimating the importance of a web page is a crucial ingredient in ranking web search results. So far, to assess the importance of web pages, two different types of feedback have been taken into account, independent of each other: the feedback obtained from the hyperlink structure among the web pages (e.g., PageRank) or the web browsing patterns of users (e.g., BrowseRank). Unfortunately, both types of feedback have certain drawbacks. While the former lacks the user preferences and is vulnerable to malicious intent, the latter suffers from sparsity and hence low web coverage. In this work, we combine these two types of feedback under a hybrid page ranking model in order to alleviate the above-mentioned drawbacks. Our empirical results indicate that the proposed model leads to better estimation of page importance according to an evaluation metric that relies on user click feedback obtained from web search query logs. We conduct all of our experiments in a realistic setting, using a very large scale web page collection (around 6.5 billion web pages) and web browsing data (around two billion web page visits). Copyright is held by the owner/author(s)
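
    One simple way to realize such a hybrid model is to mix link-following and logged browsing transitions inside a single random-surfer power iteration. The sketch below is only an illustration under that assumption; the paper's exact formulation may differ, and alpha and beta are illustrative parameters.

    def hybrid_rank(link_out, browse_out, n, alpha=0.85, beta=0.5, iters=50):
        """link_out[u]: list of pages that u links to.
        browse_out[u]: {v: click_count} observed in browsing logs.
        A surfer at u follows hyperlinks with weight beta and logged browsing
        transitions with weight 1-beta, then teleports with probability 1-alpha."""
        rank = [1.0 / n] * n
        for _ in range(iters):
            next_rank = [(1.0 - alpha) / n] * n
            for u in range(n):
                share = alpha * rank[u]
                outs = link_out.get(u, [])
                clicks = browse_out.get(u, {})
                total_clicks = sum(clicks.values())
                if not outs and not total_clicks:
                    for v in range(n):            # dangling page: spread uniformly
                        next_rank[v] += share / n
                    continue
                w_link = beta if outs else 0.0
                w_browse = (1.0 - beta) if total_clicks else 0.0
                norm = w_link + w_browse
                for v in outs:
                    next_rank[v] += share * (w_link / norm) / len(outs)
                for v, count in clicks.items():
                    next_rank[v] += share * (w_browse / norm) * count / total_clicks
            rank = next_rank
        return rank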

    Hypergraph-theoretic partitioning models for parallel web crawling

    Parallel web crawling is an important technique employed by large-scale search engines for content acquisition. A commonly used inter-processor coordination scheme in parallel crawling systems is the link exchange scheme, where discovered links are communicated between processors. This scheme can attain the coverage and quality level of a serial crawler while avoiding redundant crawling of pages by different processors. The main problem in the exchange scheme is the high inter-processor communication overhead. In this work, we propose a hypergraph model that reduces the communication overhead associated with link exchange operations in parallel web crawling systems by intelligent assignment of sites to processors. Our hypergraph model can correctly capture and minimize the number of network messages exchanged between crawlers. We evaluate the performance of our models on four benchmark datasets. Compared to the traditional hash-based assignment approach, significant performance improvements are observed in reducing the inter-processor communication overhead. © 2012 Springer-Verlag London Limited
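
    A possible reading of the site-to-processor model (the net definition below is an assumption made for illustration, not a verbatim reconstruction of the paper's hypergraph) treats sites as vertices and, for every target site, forms a net out of that site and all sites linking into it; the connectivity-1 cut of the site partition then approximates the number of batched link-exchange messages.

    from collections import defaultdict

    def build_site_hypergraph(inter_site_links):
        """inter_site_links: iterable of (source_site, target_site) pairs.
        For each target site t, the net contains t and every site linking
        into t; if the net spans L processors, roughly L-1 messages carrying
        newly discovered URLs of t are exchanged."""
        nets = defaultdict(set)
        for src, tgt in inter_site_links:
            nets[tgt].update((src, tgt))
        return dict(nets)

    def message_count(nets, part_of_site):
        """Connectivity-1 cut of the site partition = estimated message count."""
        return sum(len({part_of_site[s] for s in sites}) - 1
                   for sites in nets.values())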

    A web-site-based partitioning technique for reducing preprocessing overhead of parallel PageRank computation

    A power method formulation, which efficiently handles the problem of dangling pages, is investigated for the parallelization of PageRank computation. Hypergraph-partitioning-based sparse matrix partitioning methods can be successfully used for efficient parallelization. However, the preprocessing overhead due to hypergraph partitioning, which must be repeated often because of the evolving nature of the Web, is quite significant compared to the duration of the PageRank computation. To alleviate this problem, we exploit the fact that sites form a natural clustering of pages and propose a site-based hypergraph-partitioning technique, which does not degrade the quality of the parallelization. We also propose an efficient parallelization scheme for the matrix-vector multiplies in order to avoid possible communication due to the pages without in-links. Experimental results on realistic datasets validate the effectiveness of the proposed models. © Springer-Verlag Berlin Heidelberg 2007
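
    For reference, the sequential form of the dangling-page-aware power iteration looks roughly as follows. This is a standard formulation shown only to fix ideas; the parallel sparse matrix-vector multiply and the site-based partitioning that the paper actually contributes are omitted.

    def pagerank_power(out_links, n, alpha=0.85, tol=1e-8, max_iters=200):
        """out_links: {page: [pages it links to]}; pages absent from the dict or
        with empty lists are dangling. Their mass is collected and redistributed
        uniformly, avoiding dense matrix rows for dangling pages."""
        rank = [1.0 / n] * n
        for _ in range(max_iters):
            next_rank = [0.0] * n
            dangling_mass = 0.0
            for u in range(n):
                outs = out_links.get(u)
                if outs:
                    share = rank[u] / len(outs)
                    for v in outs:
                        next_rank[v] += share
                else:
                    dangling_mass += rank[u]
            # dangling redistribution and teleportation as one rank-one update
            base = alpha * dangling_mass / n + (1.0 - alpha) / n
            next_rank = [alpha * x + base for x in next_rank]
            if sum(abs(a - b) for a, b in zip(next_rank, rank)) < tol:
                return next_rank
            rank = next_rank
        return rank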

    A machine learning approach for result caching in web search engines

    A commonly used technique for improving search engine performance is result caching. In result caching, precomputed results (e.g., URLs and snippets of best-matching pages) of certain queries are stored in a fast-access storage. The future occurrences of a query whose results are already stored in the cache can be directly served by the result cache, eliminating the need to process the query using costly computing resources. Although other performance metrics are possible, the main performance metric for evaluating the success of a result cache is hit rate. In this work, we present a machine learning approach to improve the hit rate of a result cache by leveraging a large number of features extracted from search engine query logs. We then apply the proposed machine learning approach to static, dynamic, and static-dynamic caching. Compared to the previous methods in the literature, the proposed approach improves the hit rate of the result cache by up to 0.66%, which corresponds to 9.60% of the potential room for improvement. © 2017 Elsevier Ltd
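
    A minimal sketch of the idea, assuming a handful of illustrative query-log features and a logistic-regression stand-in for the paper's learner, is to predict whether a query will be seen again and admit its results to the cache only when that probability is high enough:

    from collections import Counter
    from sklearn.linear_model import LogisticRegression

    def query_features(query, now, freq, last_seen):
        terms = query.split()
        return [
            freq[query],                       # frequency seen so far in the log
            now - last_seen.get(query, now),   # recency (0 if never seen before)
            len(terms),                        # query length in terms
            sum(map(len, terms)) / max(len(terms), 1),  # average term length
        ]

    def train_admission_model(train_log, test_log):
        """train_log, test_log: chronological lists of (timestamp, query).
        A training query is a positive example if it reappears later."""
        freq, last_seen = Counter(), {}
        X, y = [], []
        future = {q for _, q in test_log}
        for ts, q in train_log:
            X.append(query_features(q, ts, freq, last_seen))
            y.append(1 if q in future else 0)
            freq[q] += 1
            last_seen[q] = ts
        return LogisticRegression(max_iter=1000).fit(X, y)

    def admit(model, query, now, freq, last_seen, threshold=0.5):
        """Admission policy for a dynamic cache: store the result only when the
        model predicts the query is likely to occur again."""
        p = model.predict_proba([query_features(query, now, freq, last_seen)])[0][1]
        return p >= threshold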

    A large-scale sentiment analysis for Yahoo! Answers

    Sentiment extraction from online web documents has recently been an active research topic due to its potential use in commercial applications. By sentiment analysis, we refer to the problem of assigning a quantitative positive/negative mood to a short bit of text. Most studies in this area are limited to the identification of sentiments and do not investigate the interplay between sentiments and other factors. In this work, we use a sentiment extraction tool to investigate the influence of factors such as gender, age, education level, the topic at hand, or even the time of day on sentiments in the context of a large online question answering site. We start our analysis by looking at direct correlations: for example, we observe more positive sentiments on weekends, very neutral ones in the Science & Mathematics topic, a trend for younger people to express stronger sentiments, and a tendency for people on military bases to ask the most neutral questions. We then extend this basic analysis by investigating how properties of the (asker, answerer) pair affect the sentiment present in the answer. Among other things, we observe a dependence on the pairing of some inferred attributes estimated from a user's ZIP code. We also show that the best answers differ in their sentiments from other answers; for example, in the Business & Finance topic, best answers tend to have a more neutral sentiment than other answers. Finally, we report results for the task of predicting the attitude that a question will provoke in answers. We believe that understanding the factors influencing the mood of users is not only interesting from a sociological point of view, but also has applications in advertising, recommendation, and search. Copyright 2012 ACM
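
    Only to make the notion of a quantitative positive/negative mood concrete, here is a toy lexicon-based scorer; the study itself relies on an existing sentiment extraction tool, and the word lists below are purely illustrative.

    POSITIVE = {"good", "great", "love", "excellent", "happy", "thanks"}
    NEGATIVE = {"bad", "terrible", "hate", "awful", "sad", "angry"}

    def sentiment_score(text):
        """Score in [-1, 1]: fraction of positive minus negative words."""
        words = [w.strip(".,!?").lower() for w in text.split()]
        if not words:
            return 0.0
        pos = sum(w in POSITIVE for w in words)
        neg = sum(w in NEGATIVE for w in words)
        return (pos - neg) / len(words)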

    A financial cost metric for result caching

    Web search engines cache the results of frequent and/or recent queries. Result caching strategies can be evaluated using different metrics, hit rate being the most well-known. Recent works take the processing overhead of queries into account when evaluating the performance of result caching strategies and propose cost-aware caching strategies. In this paper, we propose a financial cost metric that goes one step further and also takes hourly electricity prices into account when computing the cost. We evaluate the most well-known static, dynamic, and hybrid result caching strategies under this new metric. Moreover, we propose a financial-cost-aware version of the well-known LRU strategy and show that it outperforms the original LRU strategy in terms of the financial cost metric. Copyright © 2013 ACM
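
    A rough sketch of the metric and of a cost-aware eviction rule could look like the code below; the per-core power figure, the hourly price table, and the eviction policy are illustrative assumptions, not the paper's exact formulation.

    from collections import OrderedDict

    def financial_cost(cpu_seconds, hour, price_per_kwh, kw_per_core=0.1):
        """Cost of processing a query miss at a given hour of the day, given the
        24 hourly electricity prices in price_per_kwh ($/kWh)."""
        return cpu_seconds / 3600.0 * kw_per_core * price_per_kwh[hour % 24]

    class CostAwareCache:
        """LRU-like result cache whose eviction prefers the entries that are
        cheapest to recompute (ties broken toward the least recently used)."""
        def __init__(self, capacity):
            self.capacity = capacity
            self.entries = OrderedDict()      # query -> financial cost of a miss

        def get(self, query):
            if query in self.entries:
                self.entries.move_to_end(query)   # refresh recency on a hit
                return True
            return False

        def put(self, query, miss_cost):
            self.entries[query] = miss_cost
            self.entries.move_to_end(query)
            if len(self.entries) > self.capacity:
                victim = min(self.entries, key=self.entries.get)
                del self.entries[victim]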

    Effect of inverted index partitioning schemes on performance of query processing in parallel text retrieval systems

    Shared-nothing, parallel text retrieval systems require an inverted index, representing a document collection, to be partitioned among a number of processors. In general, the index can be partitioned based on either the terms or the documents in the collection, and the way the partitioning is done greatly affects the query processing performance of the parallel system. In this work, we investigate the effect of these two index partitioning schemes on query processing. We conduct experiments on a 32-node PC cluster, considering the case where the index is stored entirely on disk. Performance results are reported for a large (30 GB) document collection using an MPI-based parallel query processing implementation. © Springer-Verlag Berlin Heidelberg 2006
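
    The two partitioning schemes compared in the paper can be sketched as follows; hash-based assignment is used here only as the simplest concrete placement, and the posting-list layout is an illustrative assumption.

    def term_partition(index, K):
        """index: {term: [(doc_id, tf), ...]}. Each processor stores the complete
        posting list of every term assigned to it, so a query is routed only to
        the processors owning its terms."""
        parts = [dict() for _ in range(K)]
        for term, postings in index.items():
            parts[hash(term) % K][term] = postings
        return parts

    def document_partition(index, K):
        """Each processor stores, for every term, only the postings of its own
        documents, so every processor evaluates every query on its local
        subindex and partial results are merged afterwards."""
        parts = [dict() for _ in range(K)]
        for term, postings in index.items():
            for doc_id, tf in postings:
                parts[hash(doc_id) % K].setdefault(term, []).append((doc_id, tf))
        return parts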